Learning Subcategorization Frames from Corpora: a Case Study for Modern Greek
Abstract
Certain Natural Language Processing (NLP) applications, such as parsing and semantic processing, require complete lexicons that provide subcategorization information for a word of interest, i.e. the set(s) of syntactic constituents the word must combine with for its meaning to be fully expressed. Modern Greek allows highly flexible ordering of its syntactic phrases as well as a rich variety of syntactic constructions that may function as arguments to verbs. In this paper, we describe a set of machine learning techniques used to automatically extract subcategorization frames of verbs from corpora.
Similar resources
Combining Bayesian and Support Vector Machines Learning to automatically complete Syntactical Information for HPSG-like Formalisms
Learning Bayesian Belief Networks (BBN) from corpora and incorporating the extracted inferring knowledge with a Support Vector Machines (SVM) classifier has been applied to the automatic acquisition of verb subcategorization frames for Modern Greek. We have made use of minimal linguistic resources, such as basic morphological tagging and phrase chunking, to demonstrate that verb subcategorizati...
Learning Automatic Acquisition of Subcategorization Frames Using Bayesian Inference and Support Vector Machines
Learning Bayesian Belief Networks (BBN) from corpora and Support Vector Machines (SVM) have been applied to the automatic acquisition of verb subcategorization frames for Modern Greek. We are incorporating minimal linguistic resources, i.e. basic morphological tagging and phrase chunking, to demonstrate that verb subcategorization, which is of great significance for developing robust natural la...
Influence of Conditional Independence Assumption on Verb Subcategorization Detection
Learning Bayesian Belief Networks from corpora has been applied to the automatic acquisition of verb subcategorization frames for Modern Greek (MG). We are incorporating minimal linguistic resources, i.e. morphological tagging and phrase chunking, since a general-purpose syntactic parser for MG is currently unavailable. Comparative experimental results have been evaluated against Naive Bayes cla...
Learning Verb Subcategorization from Corpora: Counting Frame Subsets
We present some novel machine learning techniques for the identification of subcategorization information for verbs in Czech. We compare three different statistical techniques applied to this problem. We show how the learning algorithm can be used to discover previously unknown subcategorization frames from the Czech Prague Dependency Treebank. The algorithm can then be used to label dependents...
Lexical acquisition from corpora: the case of subcategorization frames in French
We present in this paper a method to automatically acquire a syntactic lexicon of subcategorization frames for French verbs directly from large corpora. The method is evaluated against existing lexical resources: we show that our system is capable of producing new frames that were not previously registered. Lastly, we show that it is possible to induce lexico-semantic classes « à la Levin » (19...
Publication date: 2000